Online Approximate Matching with Non-local Distances

Authors

  • Raphaël Clifford
  • Benjamin Sach
Abstract

A black box method was recently given that solves the problem of online approximate matching for a class of problems whose distance functions can be classified as being local. A distance function is said to be local if, for a pattern P of length m and any substring T[i, i+m−1] of a text T, the distance between P and T[i, i+m−1] is equal to Σ_j Δ(P[j], T[i+j−1]), where Δ is any distance function between individual characters. We extend this line of work by showing how to tackle online approximate matching when the distance function is non-local. We give solutions which are applicable to a wide variety of matching problems including function and parameterised matching, swap matching, swap-mismatch, k-difference, k-difference with transpositions, overlap matching, edit distance/LCS, flipped bit, faulty bit and L1 and L2 rearrangement distances. The resulting unamortised online algorithms bound the worst case running time per input character to within a log factor of their comparable offline counterpart.
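The definition of a local distance in the abstract can be illustrated with a small sketch. The code below is not the online algorithm from the paper; it is a naive offline computation that evaluates the sum of per-character distances at every alignment. The names `local_distances`, `hamming_delta` and `l1_delta` are hypothetical and chosen only for illustration, and indexing is 0-based, whereas the abstract uses 1-based positions.

```python
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")


def local_distances(
    pattern: Sequence[C],
    text: Sequence[C],
    delta: Callable[[C, C], float],
) -> List[float]:
    """Naive O(n*m) evaluation of a local distance at every alignment.

    For each start position i, returns the sum over j of
    delta(pattern[j], text[i + j]). This is an illustrative baseline,
    not the online method described in the abstract.
    """
    m = len(pattern)
    n = len(text)
    return [
        sum(delta(pattern[j], text[i + j]) for j in range(m))
        for i in range(n - m + 1)
    ]


def hamming_delta(a: str, b: str) -> int:
    """Per-character mismatch cost: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1


def l1_delta(a: float, b: float) -> float:
    """Per-character absolute difference, giving the L1 distance."""
    return abs(a - b)


if __name__ == "__main__":
    print(local_distances("abc", "abcabd", hamming_delta))
    print(local_distances([1, 2], [1, 2, 4, 1], l1_delta))
```

Any distance that fits this template, with a fixed pairing of pattern position j to text position i + j, is local in the sense above. Distances such as edit distance or swap matching do not fit it, because the pairing between pattern and text characters is not fixed in advance.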

Similar resources

Pattern matching in pseudo real-time

It has recently been shown how to construct online, non-amortised approximate pattern matching algorithms for a class of problems whose distance functions can be classified as being local. Informally, a distance function is said to be local if for a pattern P of length m and any substring T[i, i+m−1] of a text T, the distance between P and T[i, i+m−1] can be expressed as Σ_j Δ(P[j], T[i+j])...

Approximate Regular Expression Matching

We extend the definition of Hamming and Levenshtein distance between two strings used in approximate string matching so that these two distances can be used also in approximate regular expression matching. Next, the methods of construction of nondeterministic finite automata for approximate regular expression matching considering both mentioned distances are presented.

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents produce imperfect records in real world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

Error Tree: A Tree Structure for Hamming & Edit Distances & Wildcards Matching

Error Tree is a novel tree structure that is mainly oriented to solve the approximate pattern matching problems, Hamming and edit distances, as well as the wildcards matching problem. The input is a text of length n over a fixed alphabet of length Σ, a pattern of length m, and k. The output is to find all positions that have ≤ k Hamming distance, edit distance, or wildcards matching with P. Th...

Practical Methods for Approximate String Matching

Given a pattern string and a text, the task of approximate string matching is to find all locations in the text that are similar to the pattern. This type of search may be done for example in applications of spelling error correction or bioinformatics. Typically edit distance is used as the measure of similarity (or distance) between two strings. In this thesis we concentrate on unit-cost edit ...

Publication date: 2009